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Abstract 

Linear programming (LP) decoding, originally proposed by Feldman et al. [4] as an approximation 
to the maximum-likelihood (ML) decoding of binary linear codes, solves a linear optimization problem 
formed by relaxing each of the finite-field parity-check constraints into a number of linear constraints. 
While providing a number of advantages over iterative message-passing (IMP) decoders, such as its 
amenability to finite-length performance analysis, LP decoding is computationally more complex to 
implement in its original form than IMP decoding, due to both the large size of the relaxed LP problem 
and the inefficiency of using general-purpose LP solvers. 

This paper explores ideas for fast LP decoding of low-density parity-check (LDPC) codes. We first 
show a number of properties of the LP decoder, and by modifying the previously reported Adaptive 
LP decoding scheme [9] to allow removal of unnecessary constraints, we prove that LP decoding can 
be performed by solving a number of LP problems that contain at most one linear constraint derived 
from each of the parity-check constraints. Then, as a step toward designing an efficient LP solver that 
takes advantage of the particular structure of LDPC codes, we study a sparse interior-point method for 
solving this sequence of linear programs. Since the most complex part of each iteration of the interior- 
point algorithm is the solution of a (usually ill-conditioned) system of linear equations for finding the 
step direction, we propose a preconditioning algorithm to be used with the preconditioned conjugate- 
gradient method for solving such systems. The proposed preconditioning algorithm is similar to the 
encoding procedure of LDPC codes, and we demonstrate its effectiveness via both analytical methods 
and computer simulation results. 
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Efficient Implementation of Linear 
Programming Decoding 

I. Introduction 

Low-density parity-check (LDPC) codes [1] are becoming one of the dominant means of error- 
control coding in the transmission and storage of digital information. By combining randomness and 
sparsity, LDPC codes with large block lengths can correct errors using iterative message-passing (IMP) 
algorithms at coding rates that are closer to the capacity than any other class of practical codes [2]. 
While the performance of IMP decoders for the asymptotic case of infinite lengths is studied extensively 
using probabilistic methods such as density evolution [3], the finite-length behavior of these algorithms, 
especially their error floors, are still not well-characterized. 

Linear programming (LP) decoding was proposed by Feldman et al. [4] as an alternative to IMP 
decoding of LDPC and turbo-hke codes. LP decoding approximates the maximum-hkehhood (ML) 
decoding problem by a hnear optimization problem via a relaxation of each of the finite-field parity- 
check constraints of the ML decoding into a number of linear constraints. Many observations suggest 
similarities between the performance of LP and iterative message-passing decoding methods [4], [5], [6]. 
In fact, the sum-product message-passing algorithm can be interpreted as a minimization of a nonlinear 
function, known as Bethe free energy, over the same feasible region as LP decoding [7], [8]. 

Due to its geometric structure, LP decoding seems to be more amenable than IMP decoding to finite- 
length analysis. In particular, the finite-length behavior of LP decoding can be completely characterized 
in terms of pseudocodewords, which are the vertices of the feasible space of the corresponding hnear 
program. Another characteristic of LP decoding - the ML certificate property - is that its failure to 
find an ML codeword is always detectable. More specifically, the decoder always gives either an ML 
codeword or a nonintegral pseudocodeword as the solution. On the other hand, the main disadvantage of 
LP decoding is its higher complexity compared to IMP decoding. 

In order to make linear programming (LP) decoding practical, it is necessary to find efficient imple- 
mentations that make its time complexity comparable to those of the message-passing algorithms. A 
conventional implementation of LP decoding is highly complex due to two main factors: (1) the large 
size of the LP problem formed by relaxation, and (2) the inability of general-purpose LP solvers to solve 
the LP efficiently by taking advantage of the properties of the decoding problem. 
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The standard formulation of LP decoding [4] has a size that grows very rapidly with the density of the 
Tanner graph representation of the code. Adaptive LP (ALP) decoding was proposed in [9] to address 
this problem, reducing LP decoding to solving a sequence of much smaller LP problems. The size of 
these LP problems has been observed in practice to be independent of the degree distribution, and more 
specifically, less than a small factor (less than two) times the number of parity checks. However, this 
observation has not been analytically explained. 

More recently, an equivalent formulation of the LP decoding problem was proposed in [11] and [12], 
with a problem size growing linearly with both the code length and the maximum check node degrees. 
While this formulation requires solving only one LP, the overall complexity of this method in practice 
remains substantially higher than that of ALP decoding. 

In this paper, we take some steps toward designing efficient LP solvers for LP decoding that exploit 
the inherent sparsity and structure of this particular class of problems. Our approach is based on a sparse 
implementation of interior-point algorithms. In an independent work, Vontobel studied the implementa- 
tion and convergence of interior-point methods for LP decoding and mentioned a number of potential 
approaches to reduce its complexity [13]. It is also worth noting that a different Une of work in this 
direction has been to apply iterative methods based on message-passing, instead of general LP solvers, 
to perform the optimization for LP decoding; e.g. see [8] and [14]. 

We first propose two modified versions of ALP decoding. The main idea behind these modifications 
is to adaptively remove a number of constraints at each iteration of ALP decoding, while adding new 
constraints to the problem. We prove a number of properties of these algorithms, which facilitate the design 
of a low-complexity LP solver. In particular, we show that the modified ALP decoders have the single- 
constraint property, which means that they perform LP decoding by solving a series of Unear programs 
that each contain at most one Unear constraint from each parity check. An important consequence of this 
property is that the constraint matrices of the Unear programs that are solved have a structure similar, in 
terms of the locations of their nonzero entries, to that of the parity-check matrix. 

Then, we focus on the most complex part of each iteration of the interior-point algorithm, which is 
solving a system of linear equations to compute the Newton step. Since these linear systems become 
ill-conditioned as the interior-point algorithm approaches the solution, iterative methods that are often 
used for solving sparse systems, such as the conjugate-gradient (CG) method, perform poorly in the later 
iterations of the optimization. To address this problem, we propose a criterion for designing precondi- 
tioners that take advantage of the properties of LP decoding, along with a number of greedy algorithms 
to search for such preconditioners. The proposed preconditioning algorithms have similarities to the 
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encoding procedure of LDPC codes, and we demonstrate their effectiveness via both analytical methods 
and computer simulation results. 

The rest of this paper is organized as follows. In Section 11, we review codes, LP decoding, and ALP 
decoding. In Section m, we propose some modifications in ALP decoding, and demonstrate a number 
of properties of ALP decoding and its variations. In Section IV, we review a class of the interior-point 
linear programming methods, as well as the preconditioned conjugate gradient (PCG) method for solving 
linear systems, with an emphasis on sparse implementation. In Section V, we introduce the proposed 
preconditioning algorithms to improve the PCG method for LP decoding. Some theoretical analysis and 
computer simulation results are presented in Section VI, and some concluding remarks are given in 
Section VII. 

II. LP Decoding 

A. Notation 

Throughout the paper, we denote scalars and column vectors by lower-case letters (a), matrices by 
upper-case letters (A), and sets by calligraphic upper-case letters (.4). We write the zth element of a 
vector a and the (i,j)th element of a matrix A as a, and Aij, respectively. The cardinahty (size) of a 
finite set A is shown by |.A|. The support set (or briefly, support) of a vector a of length n is the set of 
locations i € {1, . . . , n} such that 7^ 0. Similarly, the fractional support of a vector a G M" is the set 
of locations z G {1, . . . , n} such that ^ Z. 

A binary hnear code C of block length ra is a subspace of {0, 1}". This supspace can be defined as 
the null space (kernel) of a parity-check matrix H G {0, in modulo-2 arithmetic. In other words, 

C={ue{0,iy'\Hx = mod 2}. (1) 

Hence, each row of H corresponds to a binary parity-check constraint. The design rate of this code is 
defined as i? = 1 — ^. In this paper, we assume that H has full row rank (mod 2), in which case the 
design rate is the same as the rate of the code. 

Given the m x n parity-check matrix, H, the code can also be described by a Taimer graph. The Taimer 
graph T is a bipartite graph containing n variable nodes (corresponding to the columns of H) and m 
check nodes (corresponding to the rows of H). We denote by I = {1, . . . ,n} the set of (indices of) 
variable nodes, and by ^ = {1, . . . , m} the set of (indices of) check nodes. Variable node i is connected 
to check node j via an edge in the Tanner graph if Hj^i = 1. 
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The neighborhood AA(j) of a check (variable) node j is the set of variable (check) nodes it is directly 
connected to via an edge, i.e., the support set of the j'th row (column) of H. The degree dj of a node 
j, where the type of the node will be clear from the context, is the cardinahty of its neighborhood. Let 
(S C J be a subset of the variable nodes. We call S a stopping set if there is no check node in the graph 
that has exactly one neighbor in S. Stopping sets characterize the termination of a belief propagation 
erasure decoder. 

Each code can be equivalently represented by many different parity-check matrices and Taimer graphs. 
However, it is important to note that the performance of suboptimal decoders, such as message-passing or 
LP decoding, may depend on the particular choice of H and T. A low-density parity-check (LDPC) code 
is a linear code which has at least one sparse Tanner graph representation, where the average variable 
node and check node degrees do not grow with n or m. 

A Unear program (LP)^ of dimension n is an optimization problem with a linear objective function 
and a feasible set (space) described by a number of linear constraints (inequalities or equations) in terms 
of n real-valued variables. Each linear constraint in the LP defines a hyperplane in n-dimensional space. 
If the solution to an LP is bounded and unique, then it is at a vertex v of the feasible space, on the 
intersection of at least n such hyperplanes. Conversely, for any vertex v of the feasible space of an LP, 
there exists a choice of the coefficients of the objective function such that v is the unique solution to the 
LP 

B. LP Relaxation of Maximum-Likelihood Decoding 

Consider a binary Unear code C of length n. If a codeword v & C is transmitted through a memoryless 
binary-input output-symmetric (MBIOS) chaimel, the ML codeword given the received vector r G 
M" is the codeword that maximizes the likelihood of observing r, i.e., 

y^ML _ aj-g jjiaxPr[r|tt]. (2) 
ueC 

For binary codes, this problem can be rewritten as the equivalent optimization problem 

minimize 7"^^ 

ML Decoding (3) 

subject to u e C, 

'Throughout the paper, we abbreviate the terms "Unear program" and "Unear programming" both as "LP". 
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where 7 is the vector of log-likelihood ratios (LLR) defined as 

^ Pr {ri\ui = 0) 
Fr{ri\ui = 1)' 

The ML decoding problem (3) is an optimization with a linear objective function in the real domain, 
but with constraints that are nonlinear in the real space (although, hnear in modulo-2 arithmetic). It 
is desirable to replace these constraints by a number of linear constraints, such that decoding can be 
performed using linear programming. The feasible space of the desired LP would be the convex hull of 
all the codewords in C, which is called the codeword poly tope. Since a global minimum occurs at one 
of the vertices of the polytope, using this feasible space makes the set of potential (unique) solutions to 
the LP identical to the set of codewords in C. Unfortunately, the number of constraints needed for this 
LP representation grows exponentially with the code length, therefore making this approach impractical. 
As an approximation to ML decoding, Feldman et al. proposed a relaxed version of this problem by first 
considering the convex hull of the local codewords defined by each row of the parity-check matrix, and 
then intersecting them to obtain what is known as the fundamental polytope, V [6]. 

To describe the (projected) fundamental polytope, linear constraints are derived from a parity-check 
matrix as follows. For each row j = 1, . . . , m of the parity-check matrix, i.e., each check node, the LP 
formulation includes the constraints 

^Ui- ^ < I V| - 1, V F C such that | V| is odd, (5) 

iev ieM{j)\v 

which can be written in the equivalent form 

^(1 -Ui)+ J2 Ui>l, yv C such that |V| is odd. (6) 

iev ieM{j)\v 

We refer to the constraints of this form as parity inequalities. If the variables Ui are zeroes and ones, 
these constraints will be equivalent to the original binary parity-check constraints. To see this, note that 
if V is a subset of with |V| odd, and the corresponding parity inequaUty fails to hold, then all 

variable nodes in V must have the value 1, while those in J\f{j)\V must have the value 0. This implies 
that the corresponding vector u does not satisfy parity check j. Conversely, if parity check j fails to 
hold, there must be a subset of variable nodes V C of odd size such that all nodes in V have the 
value 1 and all those in M{j)\V have the value 0. Clearly, the corresponding parity inequality would 
be violated. Now, given this equivalence, we relax the LP problem by replacing each binary constraint, 
Ui G {0, 1}, by a box constraint, < ttj < 1. LP decoding can then be written as 
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mimmize 



T 

7 u 



LP Decoding 



(7) 



subject to u gV. 



Lemma 1 ([4], originally by [15]): For any check node j, the set of parity inequahties (5) defines the 
convex hull of all — 1 assignments of the variables with indices in J\f{j) that satisfy the jth binary 
parity-check constraint. 

Since the convex hull of a set of vectors in [0, l]'^ is a subset of [0, 1]*^, the set of parity inequahties for 
each check node automatically restrict all the involved variables to the interval [0, 1]. Hence, we obtain 
the following corollary: 

Corollary 1: In the formulation of LP decoding above, the box constraints for variables that are 
involved in at least one parity-check constraint are redundant. 

The fundamental polytope has a number of integral (binary-valued) and nonintegral (fractional-valued) 
vertices. The integral vertices, which satisfy all the parity-check equations as shown before, exactly 
correspond to the codewords of C. Therefore, the LP relaxation has the ML certificate property, i.e., 
whenever LP decoding gives an integral solution, it is guaranteed to be an ML codeword. On the 
other hand, if LP decoding gives as the solution one of the nonintegral vertices, which are known 
as pseudocodewords, the decoder declares a failure. 

C. Adaptive Linear Programming Decoding 

In the original formulation of Feldman et al. for LP decoding, the number of parity inequalities for 
each check node of degree dj is equal to the number of odd-sized subsets of its neighborhood, which 
is equal to 2'^'~^. Even for parity-check matrices of moderate row weights, this number can be very 
large. In [9] a cutting-plane algorithm was proposed as an alternative to the direct implementation of 
LP decoding (7). In this method, referred to as "adaptive LP decoding" (ALP decoding), a hierarchy 
of hnear programs with the same objective function as in (7) are solved, with the solution to the last 
program being identical to that of LP decoding. The first linear program in this hierarchy is made up of 
only n box constraints, such that for each i G {1, 2, . . . , n}, we include the constraint 



The solution to this initial problem corresponds to the result of an (uncoded) bit-wise hard decision based 
on the received vector. 




(8) 



Ui <l if 7j < 0. 
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Algorithm 1 ALP Decoding 

1: Setup the initial LP problem with constraints from (8), and <— 0; 

2: Find the solution u'^ to the initial LP problem by bit-wise hard decision; 

3: repeat 

4: k + 1; 

5: Find the set S'' of all parity inequalities and box constraints that are violated at u'^"^; 
6: If liS'^l > 0, add the constraints in S'' to the LP problem and solve it to obtain u'^; 

7: until IcS^I = 

8: Output u = as the solution to LP decoding. 



The adaptive LP decoding algorithm is presented here as Algorithm 1 (ALP decoding). In Step 5 of 
this algorithm, the search for all the violated parity inequalities can be performed using Algorithm 1 
of [9] in 0{J2^i dj logdj) = 0{mdmax logc^maa;) time, without having to examine all the 0(m2'^"'<'»') 
parity inequahties given by the original LP decoding formulation. Furthermore, based on observations, 
it is conjectured in [10] that there is no need to check for violated box constraints in Step 5, since they 
cannot be violated at any of the intermediate solutions of ALP decoding. In the next section, we 
present a proof of this conjecture. 

In [9], the number of iterations of ALP decoding was upper-bounded by the code length, n. However, 
it was observed in the simulations that the typical number of iterations is much smaller in practice (less 
than 20 for all n < 2000). Moreover, one can conclude from the following theorem that, at each iteration 
of ALP decoding, the number of violated parity inequalities added to the problem is at most m, where 
m is the number of check nodes. 

Theorem 1 ([10]): If at any given point u G [0, 1]", one of the parity inequahties introduced by a 
check node j is violated, the rest of the parity inequahties from this check node are satisfied with strict 
inequality. 

III. Properties and Variations of ALP decoding 

In this section, we prove some properties of LP and ALP decoding, and propose some modifications to 
the ALP algorithm. As we will see, many of the elegant properties of these algorithms are consequences 
of Theorem 1 . 

First, we propose an alternative to using Algorithm 1 of [9] for finding all the violated parity inequahties 
at any given point u G [0, 1]". Consider the general form of parity inequahties in (6) for a given check 
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node j, and note that at most one of these inequaUties can be violated at u. To find this inequality, if it 
exists, we need to find an odd-sized V C that minimizes the left-hand side of (6). If there were no 

requirement that |V| is odd, the left-hand side expression would be minimized by putting any i G 
with Uj > 5 in V. However, if such V has an even cardinality, we need to select one element i* of 
to add to or remove from V, such that the increase on the left-hand side of (6) is minimal. This 
means that i* is the element whose corresponding value Uj. is closest to ^. This results in Algorithm 2, 
which has 0{dj) complexity for check node j, thus reducing the complexity of finding all the m parity 
inequaUties from O {Y."Lidj log dj) with Algorithm 1 of [9] to 0{Y,]Lidj) = 0{E), where E is the 
total number of edges in the Tanner graph. 



Algorithm 2 Find the Violated Parity Inequahty from Check Node j at u 



1: 


S^{ieM{3)\ui > 


2: 


if \S\ is odd then 


3: 




4: 


else 


5: 


i* ^ arg minig^(j-) \ui - i|; 


6: 


V ^ S\{i*} if i* G S; otherwise V ^SU {i*}; 


7: 


end if 


8: 


if (6) is satisfied at u for this j and V then 


9: 


Check node j does not introduce a violated parity inequality at u; 


10: 


else 


11: 


We have found the violated parity inequahty from check node j; 


12: 


end if 



A. Modified ALP Decoding 

Definition 1: A linear inequahty constraint of the form a^x <b is called active at point if it holds 
with equality; i.e., a^x^ = b, and is called inactive if it holds with strict inequality; i.e. a^x^ < b. 
The following is a corollary of Theorem 1 

Corollary 2: If one of the parity inequaUties introduced by a check node is active at a point x^ G [0, 1]", 
all parity inequaUties from this check node must be satisfied at 

Corollary 2 can be used to simpUfy Step 5 of ALP decoding (Algorithm 1) as follows. We first find 
the parity inequaUties currently in the problem that are active at the current solution, u'^. This can be 
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done simply by checking if the slack variable corresponding to a constraint is zero. Then, in the search 
for violated constraints, we exclude the check nodes that introduce these active inequalities. 

Now consider the linear program LP^ at an iteration k of ALP decoding, with an optimum point u^. 
This point is the vertex (apex) of the n-dimensional cone formed by all hyperplanes corresponding to the 
active constraints. It is easy to see that among the constraints in this linear program, the inactive ones 
are non-binding, meaning that, if we remove the inactive constraints from the problem, remains an 
optimum point of the feasible space. This fact motivates a modification in the ALP decoding algorithm, 
where, after solving each LP, a subset of the constraints that are active at the solution are removed. 

By combining the two ideas proposed above, we obtain the modified ALP decoding algorithm A 
(MALP-A decoding), stated in Algorithm 3. It was conjectured in [10] that no box constraint can be 
violated at any intermediate solution of ALP decoding. We will prove this conjecture for both ALP and 
MALP decoding in this section. Hence, we do not search for violated box constraints in the intermediate 
iterations of the proposed algorithms. 

Algorithm 3 MALP-A Decoding 
1: Setup the initial LP problem with constraints from (8), and A; ^ 0; 

2: Find the solution to the initial LP problem by bit-wise hard decision; 

3: repeat 

4: k + 1; flag ^ 0; 

5: for _7 = 1 to m do 

6: if there is no active parity inequality from check node j in the problem then 

7: if check node j introduces a parity inequality that is violated at u'^~^ then 

8: Remove the parity inequalities of check node j (if any) from the current problem; 

9: Add the new (violated) constraint to the LP problem; flag <— 1; 

10: end if 

11: end if 

12: end for 

13: If flag = 1, solve the LP problem to obtain u''; 

14: until flag = 

15: Output II = as Ihe solulion lo LP decoding. 



Checking the condition in line 7 can be done using Algorithm 2 in 0{dj) time, where dj is the degree 
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of check node j, and the role of the if-then structure of Une 6 is to Umit this processing to only check 
nodes that are not currently represented in the problem by an active constraint. In line 8, before adding a 
new constraint from check node j to the problem, any existing (inactive) constraint is removed from the 
problem. Alternatively, we can move this command to line 6; i.e. remove all the inactive constraints in the 
problem. We call the resulting algorithm the modified ALP decoding algorithm B (MALP-B decoding). 

The LP problems solved in the ALP and modified ALP decoding algorithms can be written in the 
"standard" matrix form as 

minimize 

subject to Au < b 

Ui>0 

Ui<l 

where matrix A is called the constraint matrix. 
B. Properties 

In Theorem 2 of [9], it has been shown that the sequence of solutions to the intermediate LP problems 
in ALP decoding converges to that of LP decoding in at most n iterations. In the following theorem, 
in addition to proving that this property holds for the two modified ALP decoding algorithms, we show 
three additional properties shared by all three variations of adaptive LP decoding. 

We assume that the optimum solutions to all the LP problems in the intermediate iterations of either 
ALP, MALP-A, or MALP-B decoding are unique. However, one can see that this uniqueness assumption 
is not very restrictive, since it holds with high probability if the channel output has a finite probability 
density function (pdf). Moreover, channels that do not satisfy this property, such as the binary symmetric 
channel (BSC), can be modified to do so by adding a very small continuous-domain noise to their output 
(or LLR vector). 

Theorem 2 (Properties of adaptive LP decoding): Let u^,u^,. . . be the unique solutions to the 
sequence of LP problems, LP°, LP^, . . . , LP^ , solved in either ALP, MALP-A, or MALP-B decoding 
algorithms. Then, the following properties hold for all three algorithms: 

a) The sequence of solutions vP,v}^. . . satisfy all the box constraints 0<'Ui<l, Vi = l,...,n. 

b) The costs of these solutions monotonically increase with the iteration number; i.e., 

7^^x° < -i'^v} <... (10) 



(9) 



V i G X : 7i > 0, 
y iel: 7i < 0, 
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c) u^,u , . . . converge to the solution of LP decoding, u*, in at most n iterations. 

d) Consider the set of parity inequahties included in LP'' which are active at its optimum solution, u'^. 
Let J"*^ = {ji,j2, ■ ■ ■ be the set of indices of check nodes that generate these inequalities. 
Then, u'^ is the solution to an LP decoding problem LPD^ with the LLR vector 7 and the Taimer 
graph corresponding to the check nodes in J^. 

The proof of this theorem is given in Appendix L 

The following theorem shows an interesting property of the modified ALP decoding schemes, which 
we call the "single-constraint property." This property does not hold for ALP decoding. 

Theorem 3: In the LP problem at any iteration k of the MALP-A and MALP-B decoding algorithms, 
there is at most one parity inequality corresponding to each check node of the Tanner graph. 

Proof: [By induction] The initial LP problem consists only of box constraints. So, it suffices to show 
that, if the LP problem LP'' at an iteration k satisfies the desired property, the LP problem LP'^^^ in the 
subsequent iteration satisfies this property, as well. Consider check node j which has a violated parity 
inequality Kj at the solution u'^ of LP'^. According to Corollary 2, if there already has been a parity 
inequality kj from this check node in LP'^, kj caimot be active at u'^, hence, the MALP decoder will 
remove kj before adding K,j to LP'^^^. As a result, there caimot be more than one parity inequality from 
any check node j in LP'^"'"^ ■ 

Corollary 3: The number of parity inequalities in any linear program solved by the MALP decoding 
algorithms is at most m 

The result above is in contrast to the non-adaptive formulations of LP decoding, where the size of the 
LP problems grows with the check node degree. Consequently, the complexity of these two algorithms 
can be bounded by their number of iterations times the worst-case complexity of solving an LP problem 
with n variables and m parity inequahties. Therefore, an interesting problem to investigate is how the 
number of iterations of the MALP decoding algorithms varies with the code parameters, such as the 
length or the check node degrees, and how its behavior changes depending on whether the LP decoding 
output is integral or fractional. In Subsection HI-D, we present some simulation results, studying and 
comparing ALP decoding and its modifications in terms of the number of iterations. 

An important consequence of Theorem 3 is that, in the LP problems that are solved by these two 
algorithms, the distribution of the nonzero elements of the LP constraint matrix, A, has the same structure 
as that of the parity-check matrix, H, after removing the rows of H that are not represented by a parity 
inequality in the LP. This is due to the fact that the support set of a row of A, corresponding to a parity 
inequality, is identical to that of the row of H from which it has been derived, and in addition, each 
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row of A is derived from a unique row of H. As we will see later in this paper, this property, which is 
not shared by LP or ALP decoding, maintains the same desirable combinatorial properties (e.g., degree 
distribution) for A that the H matrix has. This can be exploited in the design of efficient LP solvers. 

Remember that the LP problem in the last iteration of the MALP decoding algorithms has the same 
solution as standard LP decoding. This solution is a vertex of the feasible set, defined by at least n 
active inequalities from this LP problem. Hence, using Corollary 3, we conclude that at least n — m box 
constraints are active at the solution of LP decoding. This yields the following properties of LP decoding. 

Corollary 4: The solution to any LP decoding problem differs in at most n — m coordinates from the 
vector obtained by making bit-based hard decisions on the LLR vector 7. 

Corollary 5: Each pseudocodeword of LP decoding has at most m fractional entries. 

Remark 1: This bound on the size of the fractional support of pseudocodewords is tight in the sense 
that there are LP decoding relaxations which have pseudocodewords with exactly m fractional entries. 
An example is the pseudocodeword [1, ^, 0, ^, 0, 0, ^] of the (7, 4, 3) code with m = 3, given in [4]. 

C. Connection to Erasure Decoding 

For the binary erasure chaimel (BEC), the performance of belief propagation (BP), or its equivalent, 
the peeling algorithm, has been extensively studied. The peeling algorithm can be seen as performing 
row and column permutations to triangularize a submatrix of H consisting of the columns corresponding 
to the erased bits. It is known that the BP and peeling decoders succeed on the BEC if and only if the 
set of erased bits does not contain a stopping set. 

Feldman et al. have shown in [4] that LP decoding and BP decoding are equivalent on the BEC. 
In other words, the success or failure of LP decoding can also be explained by stopping sets. In this 
subsection, we show a coimection between LP decoding on the BEC and LP decoding on general MBIOS 
chaimels, allowing us to derive a sufficient condition for the failure of LP decoding on general MBIOS 
chaimels based on the existence of stopping sets. 

Theorem 4: Consider an LP decoding problem LPD^ with LLR vector ^, -fi ^ i ^1, resulting 
in the unique integral solution (i.e., the ML codeword) u. Also, let u be the result of bit-based hard 
decisions on 7; i.e., ui = if 7^ > 0, and Ui = 1 otherwise. Then, the set C X of positions where u 
and u differ, does not contain a stopping set. 

Proof: Let's assume, without loss of generality, that u is the vector of all-zeroes, in which case we 
will have 

f = {zGX|7i<0}. (11) 
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We form an LP erasure decoding problem LPD^^^ with u as the transmitted codeword and £ as the 
set of erased positions. LPD^^'^ has the same feasible space V as LPD^, but has a new LLR vector 
A, defined such that V i G X, 

^ if ieS, 



A,; 



(12) 
1 otherwise, 



Clearly, since V C [0, 1]", we have > 0, ^ v e V. We prove the theorem by showing that the 
all-zeroes vector u is the unique solution to LPD^^'-^, as well. 
Assume that there is another vector v such that we have 

X^y = = 0. (13) 

Combining (12) and (13) yields 

^ = 0, (14) 
implying that Vi = 0, V i G I\£. Therefore, using (11), the cost of the vector v for LPD^ will be 

ie£ 

< = 7^u, (15) 

with equality if and only if w j = 0, V i G X. Since, by assumption, u is the unique solution to LPD^, 
we must have v = u = [0, . . . , 0]^. Hence, u is also the unique solution to LPD^^'- ' . Finally, due to 
the equivalence of LP and BP decodings on the BEC, we conclude that £ does not contain a stopping 
set. ■ 
Theorem 4 will be used later in the paper to design an efficient way to solve the systems of hnear 
equations we encounter in LP decoding. 

D. Simulation Results 

We present simulation results for ALP, MALP-A, and MALP-B decoding of random (3, 6)-regular 
LDPC codes, where the cycles of length four are removed from the Tanner graphs of the codes. The 
simulations are performed in an AWGN channel with the SNR of 2 dB (the threshold of belief-propagation 
decoding for the ensemble of (3, 6)-regular codes is 1.11 dB), and include 8 different lengths, with 1000 
trials at each length. 

In Fig. 1, we have plotted the histograms of the number of iterations using the three algorithms for 
length n = 480. The first column of histograms includes the results of all the decoding instances, while 
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Fig. 1. The histograms of the number of iterations for ALP, MALP-A, and MALP-B decoding for a random (3, 6)-regular 
LDPC code of length 480 at SNR = 2 dB. The left, middle, and right columns respectively correspond to the results of all 
decoding instances, decodings with integral outputs, and decodings with fractional output. 



the second and third columns only include the decoding instances with integral and fractional outputs, 
respectively. From this figure, we can see that when the output is integral (second column), the three 
algorithms have a similar behavior, and they all converge in less that 15 iterations. On the other hand, 
when the output is fractional (third column), the typical numbers of iterations are 2-3 times higher for all 
algorithms, so that we observe two almost non-overlapping peaks in the histograms of the first columns. 

In Fig. 2, the average numbers of iterations of the three algorithms are plotted for both integral and 
fractional decoding outputs versus the code length. As a measure of the deviation of the results from the 
mean, we have also included the 95% one-sided confidence upper bound for each curve, which is defined 
as the smallest number which is higher than at least 95% of the values in the population. We can observe 
that the number of iterations for MALP-A and MALP-B decoding are significantly higher that that of 
ALP when the output is fractional. On the other hand, for decoding instances with integral outputs, where 
the LP decoder is successful in finding the ML codeword, the increase in the number of iterations for 
the modified ALP decoders relative to the ALP decoder is very small. Hence, the MALP decoders pay a 
small price in terms of the number of iterations in exchange for obtaining the single-constraint property. 

Moreover, our simulations indicate that the size of the largest LP that is solved in each MALP-A or 
MALP-B decoding problem is smaller on average than that of ALP decoding by 17% for integral outputs 
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10' 10^ 10^ 

Code Length 

Fig. 2. The number of iterations of ALP, MALP-A, and MALP-B decoding versus code length for random (3, 6)-regular LDPC 
codes at SNR = 2 dB. The solid and dashed curves represent, respectively, the average values and the 95% one-sided confidence 
upper bounds. 



and 30% for fractional outputs. 

IV. Solving the LP Using the Interior Point Method 

General-purpose LP solvers do not take advantage of the particular structure of the optimization 
problems arising in LP decoding, and, therefore, using them can be highly inefficient. In this and the 
next sections, we investigate how LP algorithms can be implemented efficiently for LP decoding. The 
two major techniques for linear optimization used in most applications are Dantzig's simplex algorithm 
[16] and the interior point methods. 

A. Simplex vs. Interior-Point Algorithms 

The simplex algorithm takes advantage of the fact that the solution to an LP is at one of the vertices 
of the feasible polyhedron. Starting from a vertex of the feasible polyhedron, it moves in each iteration 
(pivot) to an adjacent vertex, until an optimal vertex is reached. Each iteration involves selecting an 
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adjacent vertex with a lower cost, and computing tlie size of the step to take in order to move to that 
edge, and these are computed by a number of matrix and vector operations. 

Intertior-point methods generally move along a path within the interior of the feasible region. Starting 
from an interior point, interior point methods approximate the feasible region in each iteration, and take 
a Newton-type step towards the next point, until they get to the optimum point. Computation of these 
steps involves solving a Unear system. 

The complexity of an LP solver is determined by the number of iterations it takes to converge and 
the average complexity of each iteration. The number of iterations of the simplex algorithm has been 
observed to be polynomial (superlinear), on average, in the problem dimension n, while its worst-case 
performance can be exponential. An intuitive way of understanding why the average number of simplex 
pivots to successfully solve an LP decoding problem is at least linear in n is to note that each pivot 
makes one basic primal variable nonbasic (i.e. sets it to zero) and makes one nonbasic variable basic (i.e. 
possibly increases it from zero). Hence, starting from an initial point, it should generally take at least a 
constant times n pivots to arrive at a point corresponding to a binary codeword. Therefore, even if the 
computation of each simplex iteration were done in Unear time, one could not achieve a miming time 
better that O(n^), unless the simplex method is fundamentally revised. 

In contrast to the simplex algorithm, for certain classes of iterior-point methods, such as the path- 
following algorithm, the worst-case number of iterations has been shown to be ©(^/n), although these 
algorithms typically converge in O(logn) iterations [17]. Therefore, if the Newton step at each iteration 
can be computed efficiently, taking advantage of the sparsity and structure in the problem, one could 
obtain an algorithm that is faster than the simplex algorithm for large-scale problems. 

Interior-point methods consist of a variety algorithms, differing in the way the optimization problem 
is approximated by an unconstrained problem, and how the step is calculated at each iteration. One of 
the most successful classes of interior-point methods is the primal-dual path-following algorithm, which 
is most effective for large-scale applications. In the following subsection we present a brief review of 
this algorithm. For a more comprehensive description, we refer the reader to the Uterature on linear 
programming and interior-point methods. 

B. Primal-Dual Path-Following Algorithm 

For simphcity, in this section we assume that the LP problems that we want to solve are of the form 
(9). However, by introducing a number of additional slack variables, we can modify all the expressions 
in a straighforward way to represent the case where both types of box constraints may be present for 
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each variable. 

We first write the LP problem with q variables and p constraints in the "augmented" form 

minimize c^x 

Primal LP subject to Ax = b, (16) 

x>0. 

Here, to convert the LP problem (9) into the form above, we have taken two steps. First, noting that each 
variable ui in (9) is subject to exactly one box constraint of the form Uj > or Ui < 1, we introduce 
the variable vector x and cost vector c, such that for any i = 1, . . . , n, Xi = Ui and Cj = 7j if the 
former inequality is included (i.e., 7^ > 0), and Xi = l — Ui and Cj = — 7^, otherwise. Therefore, the box 
constraints will all have the form Xi > 0, and the coefficients of the parity inequalities will also change 
correspondingly. Second, for any j = 1, . . . ,p, we convert the parity inequality Aj^^x < bj in (9), where 
denotes the jth row of A, to a linear equation Aj^^x + Xn+j = bj , by introducing p noimegative 
slack variables Xn+i, ■ ■ ■ ,Xq, where q = n + p, with corresponding coefficients equal to zero in the cost 
vector, c. We will sometimes refer to the first n (non-slack) variables as the standard variables. The dual 
of the primal LP has the form 

miiumize b^y 

Dual LP subject to A^y + z = c, (17) 

z>0, 

where y and z are the dual standard and slack variables, respectively. 

The first step in solving the primal and dual problems is to remove the inequality constraints by 
introducing logarithmic barrier terms into their objective functions.^ The primal and dual objective 
functions will thus change to c-^.t — /i Y^l^i log Xi and b^y — fi Yli=i respectively, for some fi > 0, 

resulting in a famihy of convex nonlinear barrier problems P{fi), parameterized by /x, that approximate 
the original linear program. Since the logarithmic term forces x and z to remain positive, the solution to 
the barrier problem is feasible for the primal-dual LP, and it can be shown that as n ^ 0, it approaches 
the solution to the LP problem. The key idea of the path-following algorithm is to start with some > 0, 
and reduce it at each iteration, as we take one step to solve the barrier problem. 

^Because of this step, interior-point methods are sometime referred to in the Uterature as barrier methods. 
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The Karush-Kuhn Tucker (KKT) conditions provide necessary and sufficient optimality conditions for 
P{n), and can be written as [17, Chapter 9] 

Ax = h (18) 

= c (19) 

XZe = ne (20) 

x,z>0, (21) 

where X and S are diagonal matrices with the entries of x and z on their diagonal, respectively, and e 
denotes the all-ones vector. If we define 

Ax-b 
F{s) = A^y + Z-C , 
XZe — fxe 

where s = {x, y, z) is the current primal-dual iterate, the problem of solving P(/u) reduces to finding the 
(unique) zero of the multivariate function F{s). In Newton's method, F{s) is iteratively approximated 
by its first order Taylor series expansion around s = 

F{s^ + As'^) « F{s^) + J(s'=)As^ (22) 

where J{s) is the Jacobian matrix of F{s). The Newton direction As'^ = (Ax'^, Ay'^, Az'^) is obtained 
by setting the right-hand side of (22) to zero, resulting in the following system of Unear equations: 

^00 Az^ n 

/ Ay'^ = rc (23) 

Zk Xk\ [Az^\ |_re_ 
where = h— Ax^, Vc = c — A^y'' — z^, and rg = jJ^e — XkZ^e are the residuals of the KKT equations 
(18), and fi'^ is the value of fi at iteration k. If we start from a primal and dual feasible point, we will 
not need to compute rj, and Tc, as they will remain zero throughout the algorithm. However, for sake of 
generality, here we do not make any feasibiUty assumption, in order to have the flexibility to apply the 
equations in the general, possibly infeasible case. 
The solution to the linear system (23) is given by 

{ADlA'^)Ay'' =n + ADlrc - AZ;\^, (24) 

Ax'^ = DlA^/S.y'^ - Dire + Z-\,, (25) 



Az^ = X^\re- ZAx^ 



(26) 
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where 

Dl=XkZ-\ (27) 

To simplify the notation, we will henceforth drop the subscript k from D^, but it should be noted that 
-D is a function of the iteration number, k. Having the Newton direction, the solution is updated as 

^k+i ^^k_^_ f^kj^^k^ 

and the primal and dual step lengths, Pp,P^ G [0, 1], are chosen such that all the entries of x and z 
remain nonnegative. 

Since we are interested in solving the LP and not the barrier program P{n) for a particular /x, rather 
than taking many Newton steps to approach the solution to P{n), we reduce the value of fi each time 
a Newton step is taken, so that barrier program gives a better approximation of the LP. A reasonable 
updating rule for /j, is to make it proportional to the duality gap ga = {x^y z^, that is 

/ = (28) 

The primal-dual path-following algorithm described above will iterate until the duality gap becomes 
sufficiently small; i.e. {x^y z^ < e. It has been shown that with a proper choice of the step lengths, this 
algorithm takes O[^log{eo/e)) to reduce the duality gap from eo to e. 

In order to initiahze the algorithm, we need some feasible > 0, and z° > 0. Obtaining such an 
initial point is nontrivial, and is usually done by introducing a few dummy variables, as well as a few 
rows and columns to the constraint matrix. This may not be desirable for a sparse LP, since the new rows 
and columns will not generally be sparse. Furthermore, if the Newton directions are computed based on 
the feasibility assumption; i.e. that ri, = and Vc = 0, round-off errors can cause instabilities due to 
the gradual loss of feasibiUty. As an alternative, an infeasible variation of the primal-dual path-following 
algorithm is often used, where any x° > 0, y^, and z° > can be used for initialization. This algorithm 
will simultaneously try to reduce the duality gap and the primal-dual feasibility gap to zero. Consequently, 
the termination criterion will change: we stop the algorithm if [x^Yz'^ < e, ||r(,|| < 5p, and ||rc|| < 6d- 

C. Computing the Newton Directions: Preconditioned Conjugate Gradient Method 

The most complex step at each iteration of the interior-point algorithm in the previous subsection is 
to solve the "normal" system of linear equations in (24). While these equations were derived for the 
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primal-dual path-following algorithm, in most other variations of interior-point methods, we encounter 
linear systems of similar forms, as well. 

Various algorithms for solving linear systems fall into two main categories of direct methods and 
iterative methods. While direct methods, such as Gaussian ehmination attempt to solve the system in 
a finite number of steps, and are exact in the absence of rounding errors, iterative methods start from 
an initial guess, and derive a sequence of approximate solutions. Since the constraint matrix AD^A^ 
in (24) is symmetric and positive definite, the most common direct method for solving this problem is 
based on computing the Cholesky decomposition of this matrix. However, this approach is inefficient 
for large-scale sparse problems, due to the computational cost of the decomposition, as well as loss of 
sparsity. Hence, in many LP problems, e. g. network flow linear programs, iterative methods such as the 
conjugate gradient (CG) method [18] are preferred. 

Suppose we want to find the solution x* to a system of hnear equations given by 

Qx = w, (29) 

where Q is a q x q symmetric positive definite matrix. Equivalently, x* is the unique minimizer of the 
functional 

f{x) = ^x'^Qx - nFx. (30) 

We call two nonzero vectors, u,v ^ TZ'^, Q-conjugate if 

u^Qv = 0. (31) 

The CG method is based on building a set of Q-conjugate basis vectors hi,. . . ,hq, and computing the 
solution X* as 

X* = aihi, . . . ,aqhq, (32) 

where ak = . Hence, the problem becomes that of finding a suitable set of basis vectors. In the CG 

method, these vectors are found in an iterative way, such that at step k, the next basis vector hk is chosen 
to be the closest vector to the negative gradient of f{x) at the current point x^, under the condition that 
it is Q-conjugate to hi,... , hk-i- For a more comprehensive description of this algorithm, the reader is 
referred to [19]. 

While in principle the CG algorithm requires q steps to find the exact solution x*, sometimes a much 
smaller number of iterations provides a sufficiently accurate approximation to the solution. The distribution 
of the eigenvalues of the coefficient matrix Q has a crucial effect on the convergence behavior of the 



February 4, 2009 



DRAFT 



21 



CG method (as well as many other iterative algorithms). In particular, it is shown that [19, Chapter 6] 



-x^\\q<2[ ^ \ ]%.*-A\q. (33) 
Vk(Q) + 1 



where ||a;||Q = {x^Qx) and k,{Q) is the spectral condition number of Q, i.e. the ratio of the maximum 
and minimum eigenvalues of Q. Using this result, the number of iterations of the CG method required 
to reduce — x'^H by a certain factor from its initial value can be upper-bounded by a constant times 
We henceforth call a matrix Q ill-conditioned, in loose terms, if CG converges slowly in solving 

(29). 

In the interior-point algorithm, the spectral behavior of Q = AD'^A^ changes as a function of the 
diagonal elements di, . . . , dg, of D, which are, as described in the previous subsection, the square roots of 
the ratios between the primal variables {xi\ and the dual slack variables {zi\. In Fig. 3, the evolution of 
the distributions of {xj}, {zi], and {di} through the iterations of the interior-point algorithm is illustrated 
for an LP subproblem of an MALP decoding instance. We can observe in this figure that Xi and zi are 
distributed in such a way that the product XiZi is relatively constant over dW i = 1, . . . ,q. This means 
that, although the path-following algorithm does not completely solve the barrier problems defined in 
rV-B, the condition (20) is approximately satisfied for all i. A consequence of this, which can also be 
observed in Fig. 3, is that 

di K, -^Xi, y i = l,...,q. (34) 

As the iterates of the interior-point algorithm become closer to the solution and approaches zero, many 
of the diS take very small or very large values, depending on the value of the corresponding Xj in the 
solution. This has a negative effect on the spectral behavior of Q, and as a result, on the convergence of 
the CG method. 

When the coefficient matrix Q of the system of linear equations is ill-conditioned, it is common to use 
preconditioning. In this method, we use a symmetric positive-definite matrix M as an approximation of 
Q, and instead of (29), we solve the equivalent preconditioned system 

M-^Qx = M-^w. (35) 

We hence obtain the preconditioned conjugate gradient (PCG) algorithm, summarized as Algorithm 4. 

In order to obtain an efficient PCG algorithm, we need the preconditioner M to satisfy two require- 
ments. First, M~^Q should have a better spectral distribution than Q, so that the preconditioned system 
can be solved faster than the original system. Second, it should be inexpensive to solve Mx = z, since 
we need to solve a system of this form at each step of the preconditioned algorithm. Therefore, a natural 
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Fig. 3. The parameters di, Xi, and zu for i = 1, ... ,q at four iterations of the interior-point method for an LP subproblem 
of MALP decoding with n = 1920, p = 627, q = 2547. The variable indices, i, (horizontal axis) are permuted to sort di in 
increasing order. 



approach is to design a preconditioner which, in addition to providing a good approximation of Q, has 
an underlying structure that makes it possible to solve Mx = z using a direct method in hnear time. 

One important application of the PCG algorithm is in interior-point implementations of LP for minimum- 
cost network flow (MCNF) problems. For these problems, the constraint matrix A in the primal LP 
corresponds to the node-arc adjacency matrix of the network graph. In other words, the LP primal 
variables represent the edges, each constraint is defined for the edges incident to a node, and the diagonal 
elements, di,... ,dq, of the diagonal matrix D can be interpreted as weights for the q edges (variables). 
A common method for designing a preconditioner for AD'^A^ is to select a set M of p columns of A 
(edges) with large weights, and form M = AmD\^Aj^, where the subscript M. for a matrix denotes a 
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Algorithm 4 Preconditioned Conjugate Gradient (PCG) 

1: Compute an initial guess for the solution; 

2: = w- Qx^; 

3: Solve Mz^ = r°; 

4: = zO; 

5: for i = 0, . . . , until convergence do 

6: r = Qh'; 

7: ai = {z'fry{h')^l'; 

8: = + 0!*/i*; 

9: 7-*+i = — a^l^; 

10: Solve Mz^+^ = r^+^ 

11: = (z^+i)^r^+V(zOV; 

12: h^+^ = z^+^ + v^K-; 

13: end for 



matrix consisting of the columns of the original matrix with indices in M.. 

It is known that at a non-degenerate solution to an MCNF problem, the nonzero variables (i.e., the 
basic variables) correspond to a spanning tree in the graph. This means that, when the interior-point 
method approaches such a solution, the weights of all the edges, except those defining this spanning tree, 
will go to zero. Hence, a natural selection for M. would be the set of indices of the spaiming tree with 
the maximum total weight, which results in the maximum-weight spaiming tree (MST) preconditioner. 
Finding the maximum-weight spanning tree in a graph can be done efficiently in linear time, and besides, 
due to the tree structure of the graph represented by Am, the matrix M can be inverted in linear time 
as well.^ The MST has been observed in practice to be very effective, especially at the latter iterations 
of the interior-point method, when the operating point is close to the final solution. 

V. Preconditioner Design for LP Decoding 

Our framework for designing an effective preconditioner for LP decoding, similar to the MST pre- 
conditioner for MCNF problems, is to find a preconditioning set, M. C {1, . . . ,^}, corresponding to p 

^Throughout the paper, we refer to solving a system of linear equations with coefficient matrix M, in loose terms, as inverting 
M, although we do not explicitly compute . 



February 4, 2009 



DRAFT 



24 



columns of A and D, resulting in p x p matrices A_\4 and Dj^, such that M = Aj^D'j^A'jl^ is both 
easily invertible and a good approximation of Q = AD^A^ . To satisfy these requirements, it is natural 
to select M to include the variables with the highest weights, {di}, while keeping Am and Dm full 
rank and invertible in 0{q) time. Then, the solution z^~^^ to Mz'^'^^ = r*"*"^ in the PCG algorithm can 
be found by sequentially solving Am/i = r'+\ 1)^/2 = fi, and Aj^z''^'^ = j^, for /i, /2, and 
respectively. 

We are interested in having a graph representation for the constraints and variables of a hnear program 
of the form (16) in the LP decoding problem, such that the selection of a desirable M. can be interpreted 
as searching for a subgraph with certain combinatorial structures. 

Definition 2: Consider an LP of the form (16) with p constraints and q variables, where x^+i, . . . , 
are slack variables. The extended Tanner graph of this LP is a bipartite graph consisting of q variable 
nodes and p constraint nodes, such that variable node % is coimected to constraint node i if xi is involved 
in the jth constraint; i.e., Ai^j is nonzero. 

For the hnear programs in the MALP decoding algorithms, since each constraint is derived from 
a unique check node of the original Taimer graph, the extended Taimer graph will be a subgraph of 
the Taimer graph, with the addition of q degree- 1 (slack) variable nodes, each coimected to one of the 
constraint nodes. In general, for an iteration of MALP decoding of a code with an m x n parity-check 
matrix, the extended Tanner graphs would contain p <m constraint nodes, n variable nodes corresponding 
to the standard variables (bit positions), and p slack variable nodes. As extended Tanner graphs are special 
cases of Tanner graphs, they inherit all the combinatorial concepts defined for Tanner graphs, such as 
stopping sets. A small example of an extended Tanner graph is given in Fig. 4. 

A. Preconditioning via Triangulation 

For a sparse constraint matrix. A, a sufficient condition for Am and to be invertible in 0{q) time 
is that Am can be made upper or lower triangular, with nonzero diagonal elements, using column and/or 
row permutations. We call a preconditioning set M. that satisfies this property a triangular set. Once an 
upper- (lower-) triangular form A^ of Am is found, we start from the last (first) row of A^, and, by 
taking advantage of the sparsity, solve for the variable corresponding to the diagonal element of each 
row recursively in 0(1) time. It is not difficult to see that there always exists at least one triangular set 
for any LP decoding problem; one example is the set of columns corresponding to the slack variables, 
which results in a diagonal Am- 

As a criterion for finding the best approximation AmDJ^AjI^ of AD^A^, we search for the triangular 
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Fig. 4. An extended Tanner graph for an LP problem with n = 4, p = 3, and q = 7. 



set that contains the columns with the highest weights, di. One can consider different strategies of 
scoring a triangular set from the weights of its members, e.g., the sum of the weights, or the largest 
value of minimum weight. It is interesting to study as a future work whether given any such metric, the 
"maximum-weight" (or optimal) triangular set can be found in polynomial time. However, in this work, 
we propose a (suboptimal) greedy approach, which is motivated by the properties of the LP decoding 
problem. 

The problem of bringing a parity-check matrix into (approximate) triangular form has been studied by 
Richardson and Urbanke [20] in the context of the encoding of LDPC codes. The authors proposed a 
series of greedy algorithms that are similar to the peeling algorithm for decoding in the binary erasure 
channel: repeatedly select a nonzero entry (edge) of the matrix (graph) lying on a degree- 1 column or 
row (variable or check node), and remove both the column and row of this entry from the matrix. They 
showed that parity-check matrices that are optimized for erasure decoding can be made almost triangular 
using this greedy approach. It is important to note that this combinatorial approach only relies on the 
placement of the nonzero entries of the matrix, rather than their values. 

The fact that the constraint matrices of the LP problems in MALP decoding have structure similar to 
the corresponding parity-check matrix motivates the use of a greedy algorithm analogous to those in [20] 
for triangulating the matrix A. However, this problem is different from the encoding problem, in that we 
are not merely interested in making A triangular, but rather, we look for the triangular submatrix with 
the maximum weight. In fact, as mentioned earher, finding one triangular form of A is trivial, due to 
the presence of the slack variables. Here, we present three greedy algorithms to search for the MTS, one 
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of which is related to the algorithms of Richardson and Urbanke. Throughout this section, we will also 
refer to the outputs of these (suboptimal) greedy algorithms, in loose terms, as the MTS, although they 
may not necessarily have the maximum possible weight. 

1 ) Incremental Greedy Search for the MTS: Although an ideal preconditioning set would contain the 
q columns of the matrix that have the q highest weights, in reality, the square submatrix of A composed 
of these q columns is often neither triangular nor full rank. In the incremental greedy search for the 
MTS, we start by selecting the highest- weight column, and try to expand the set of selected columns by 
giving priority to the columns of higher weights, while maintaining the property that the corresponding 
submatrix can be made lower-triangular by column and row permutations. 

Let iS be a set of selected columns from A, where \S\ < p. In order to check whether the submatrix 
As can be made lower-triangular by column and row permutations, we can treat the variable nodes 
corresponding to 5 in the Tanner graph as erased bits, and use the peeling algorithm to decode them 
in 0{q) time. For completeness, this process, which we call the Triangulation Step, is described in 
Algorithm 5. 



Algorithm 5 Triangulation Step 



1: Input: The set S with \S\ = s < p, and the matrix A; 

2: Output: An s x s lower-triangular submatrix A^, if possible; 

3: Initialization: A <— As, and initialize col and row as zero-length vectors; 

4: for A; = 1 to s do 

5: if the minimum row degree in A is not one then As cannot be made lower-triangular by 

permutation; Declare Failure and exit the algorithm; 
6: Select any degree- 1 row j from A, and let i be the index of the column that contains the only 

nonzero entry of row j; 

col 



col 



row 



row 
j 



8: Set all the entries in column i and row j of A to zero; 
9: end for 

10: Form A^ by setting A^. . = Ascoh,rowp V j e {1, . . . s}; 



Using the Triangulation Step as a subroutine, the incremental greedy search method, given by Algo- 
rithm 6, first sorts the columns according to their corresponding weights, di (or, alternatively, Xi), and 
initiaUzes the preconditioning set, M, as an empty set. Starting with the highest-weight column and 
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going down the sorted list of column indices, it adds each column to M if the submatrix corresponding 
to the resulting set can be made lower triangular using the Triangulation Step. 

Algorithm 6 Incremental Greedy Search for the MTS 
1: Input: p X q constraint matrix A, and the set of column weights, cZi . . . dq] 

2: Output: A triangular set M and the p x p lower-triangular matrix A^; 

3: Initialization: ^ 0, i ^ 0; 

4: Sort the column indices {1, . . . , g} according to their corresponding weights, di, in decreasing order, 

to obtain the permuted sequence tti, . . . , tt^, such that cZtti > ■ ■ ■ > cJtt, ; 

5: while i < q and \M\ < ^? do 

6: M^MU {iTi}; 

1: if the Triangulation Step can bring the submatrix As into the lower-triangular form A^ then 

8: M^S,A%^ A%; 

9: end if 

10: end while 



We claim that, due to the presence of the slack columns in A, Algorithm 6 will successfully find a 
triangular set M. of p columns; i.e., it exits the while-loop (lines 5-10) only when \M\ = p. Assume, 
on the contrary, that the algorithm ends while |A^| < p, so that the matrix Am is a p x \M\ lower- 
triangular matrix. This means that if we add any column A; G {!,... ,q}\M. to M., it caimot be made 
lower triangular, since otherwise, column k would have already been added to |A^| when tt^ = A; in the 
while-loop."* However, this clearly caimot be the case, since we can produce di p x p lower-triangular 
matrix A^, simply by adding the columns corresponding to the slack variables of the last p — |A^| rows 
of Am- Hence, we conclude that \M\ = p. 

2 ) Column-wise Greedy Search for the MTS: Algorithm 7 is a column-wise greedy search for the 
MTS. It successively adds the index of the maximum-weight degree-1 column of A to the set M., and 
eliminates this column and the row that shares its only nonzero entry. Matrix A initially contains p 
degree-1 slack columns, and at each iteration, one such column will be erased. Hence, there is always 
a degree-1 column in the residual matrix, and the algorithm proceeds until p columns are selected. The 
resulting preconditioning set will correspond to an upper-triangular submatrix Am- 

''Note that if any set S of columns can be made lower triangular, any subset of these columns can be made lower triangular, 
as well. 
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Algorithm 7 Column- wise Greedy Search for the MTS 



1: Input: p X g constraint matrix A, and the set of column weights c?i, . . . , cZ^; 
2: Output: A triangular set M and the upper-triangular matrix A^; 
3: Initialization: A A, M ^, and initiaUze col and row as zero-length vectors; 
4: Define and form VSQl as the index set of all degree- 1 columns in A; 



5: 



for A; = 1 to p do 



6: 



Let i G VSQl be the index of the (degree-1) column of A with the maximum weight, di, and let 



7: 




8: Set all the entries in row j of A (including the only nonzero entry of column i) to zero; 
9: Update VSQl from the residual matrix, A; 
10: end for 



3) Row-wise Greedy Search for the MTS: Algorithm 8 uses a row-wise approach for finding the MTS. 
In this method, we look at the set of degree-1 rows, add to M. the indices of all the columns that 
intersect with these rows at nonzero entries, and ehminate these rows and columns from A. Unhke the 
column- wise method, it is possible that, at some iteration, these is no degree-1 row in the matrix. In this 
case, we repeatedly ehminate the lowest- weight column, until there is one or more degree-1 rows. 

In addition to this difference, the number of columns in M. by the end of this procedure is often 
slightly smaller that p. Hence, we perform a "diagonal expansion" step at the end, where p — \M.\ 
columns corresponding to the slack variables are added to M, while keeping it a triangular set. A 
problem with this expansion method is that, since the algorithm does not have a choice in selecting the 
slack variables added in this step, it may add columns that have very small weights. 

Let A^^ be the triangular submatrix obtained before the expansion step. As an alternative to diagonally 
expanding A'^^ by adding slack columns, we can apply a "triangular expansion." In this method, we 
form a matrix A consisting of the columns of A that do not share any nonzero entries with the rows 
in vector row, and apply a column-wise or row-wise greedy search to this matrix in order to obtain a 
high-weight lower-triangular submatrix A^ . This requirement for forming A ensures that the resulting 
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Form A^ by setting A^.. = Acou,rowp V j G {1, ...p}; 
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Algorithm 8 Row-wise Greedy Search for the MTS 



1: Input: p X q constraint matrix A, and the set of column weights c?i, . . . , cZ^; 

2: Output: A triangular set M and the lower-triangular matrix ; 

3: Initialization: A A, M ^, and initialize col and row as zero-length vectors; 

4: Define and form VSQl as the index set of all degree- 1 rows in A; 

5: while A is not all zeroes do 

6: if \V£gi\ > then 

7: Let j € VSQl be any degree- 1 row of A, and i be the index of the column that contains the 
only nonzero entry of this row; 

, . , . . , col row 

8: M ^ M U I, col ^ , row ^ ; 

« J [ i _ 

9: Set all the entries in column i of A (including the only nonzero entry of row j) to zero, and 

update VSgi; 
10: else 

11: Let i be the index of the nonzero column of A with the minimum weight, di. Set all the entries 

in column i to zero, and update VSQl; 
12: end if 
13: end while 

14: Diagonal Expansion: For each row j of A that is not represented in row, append j to row, and 

append i = j + n, i.e., the index of the corresponding slack column, to both col and M; 
15: Form A'^ by setting A^. = A^oi^rowj, V i,j € {1, ...p}; 



triangular submatrices and A^^ can be concatenated as 



A 



A 
Ml 

B A 





A 
M2 



(36) 



to form a larger triangular submatrix of A. This process can be continued, if necessary, until a square 
p X p triangular matrix A'^ is obtained, although our experiments indicate that one expansion step is 
often sufficient to provide such a result. It is easy to see that this approach is potentially stronger than 
the diagonal expansion in Algorithm 8, since it has the diagonal expansion as a special case. 
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B. Implementation and Complexity Considerations 

To compute the running time of Algorithm 6, note that while Step 4 has Oiyqiogq) complexity, the 
computational complexity of the algorithm is dominated by the Triangulation Step. This subroutine has 
0{q) complexity, and is called 0{q) times in Algorithm 6, which makes the overall complexity 0{q^). 
An interesting problem to investigate is whether we can simplify the triangulation process in Une 7 to 
have subhnear complexity by exploiting the results of the previous round of triangulation, as stated in 
the following open problem concerning erasure decoding: 

Open Problem: Consider the Tanner graph corresponding to an arbitrary LDPC code of length n. 
Assume that a set £ of bits are erased, and £ does not contain a stopping set in the Tanner graph. Thus, 
the decoder successfully recovers these erased bits using the peeling algorithm (i.e., the triangulation 
Algorithm 5). Now, we add a bit i to the set of erased bits. Given j, £, and the complete knowledge of 
the process of decoding £, such as the order in which the bits are decoded, and the check nodes used, 
is there an o(n) scheme to verify if f U {i} can be decoded by the peehng algorithm? 

In addition this potential simplification, it is possible to make a number of modifications to Algorithm 
6 in order to reduce its complexity. Let s be the size of the smallest stopping set in the extended Tanner 
graph of A, which means that the submatrix formed by any s — 1 columns can be made triangular. 
Then, instead of initializing J\A to be the empty set, we can immediately add the s — 1 highest-weight 
columns to M., since we are guaranteed that Am can be made triangular. Moreover, at each iteration of 
the algorithm, we can consider k > 1 column to be added to Al, in order to reduce the number of calls 
to the triangulation subroutine. The value of k can be adaptively selected to make sure that the modified 
algorithm remains equivalent to Algorithm 6. 

To assess the complexity of Algorithm 7, we need to examine Steps 8 and 11 that involve column 
or row operations, as well as Steps 4, 6, and 9 that deal with the list of degree- 1 columns. Since there 
is an 0(1) number of nonzero entries in each column or row of A, running Step 8 p times (due to the 
for-loop), and deriving A^ from A in Step 11 each take 0{q) time. However, one should be careful in 
selecting a suitable data structure for storing the set VSQl, since, in each cycle of the for-loop, we need 
to extract the element with the maximum weight, and add to and remove from this set an 0(1) number 
of elements. By using a binary heap data structure [21], which is implementable as an array, all these 
(Steps 6 and 9) can be done in 0{\ogq) time in the worst case. Also, the initial formation of the heap 
(Step 4) has 0{q) complexity. As a result, the total complexity of Algorithm 7 becomes 0{qlogq). 

Similarly, in Algorithm 8, we need a mechanism to extract the minimum-weight member of the set 



February 4, 2009 



DRAFT 



31 



of remaining columns. While the heap structure mentioned above works well here, since no column is 
added to the set of remaining columns, we can alternatively sort the set of all columns by their weights 
as a preprocessing step with 0{qlogq) complexity, thus making the complexity of the while-loop Unear. 
Since the complexity of steps 15 (diagonal expansion) and 16 are linear, as well, the total running time 
of Algorithm 8 will be 0{qlogq). 

The process of finding a triangular preconditioner is performed at each iteration of the interior-point 
algorithm. Since the values of primal variables, {xj}, do not substantially change in one iteration, we 
expect the maximum-weight triangular set at each iteration to be relatively close to that in the previous 
iteration. Consequently, an interesting area for future work is to investigate modifications of the proposed 
algorithms, where the knowledge of the MTS in the previous iteration of the interior-point method is 
exploited to improve the complexity of these algorithms. 

VI. Analysis of the MTS Preconditioning Algorithms 
A. Performance Analysis 

It is of great interest to study how the proposed algorithms perform as the problem size goes to infinity. 
We expect that a number of asymptotic results similar to those of Richardson and Urbanke in [20] can 
be derived, e.g., showing that the greedy preconditioner designs perform well for capacity-approaching 
LDPC ensembles. However, since one of the main advantages of LP decoding over message-passing 
decoding is its geometrical structure that facilitates the analysis of its performance in the finite-length 
regime, in this work we focus on studying the proposed algorithms in this regime. 

We will study the behavior of the proposed preconditioner in the later iterations of the interior-point 
algorithm, when the iterates are close to the optimum. This is justified by the fact that, as the interior- 
point algorithm approaches the boundary of the feasible region during its later iterations, many of the 
primal variables, Xi, and the dual slack variables, Zi, approach zero, thus deteriorating the conditioning 
of the matrix Q = AD'^A^. This is when a precoditioner is most needed. In addition, we can obtain 
some information about the performance of the preconditioner in the later iterations by focusing on the 
optimal point of the feasible set. 

Consider an LP problem in the augmented form (16) as part of ALP or MALP decoding, and assume 
that it has a unique optimal solution (although parts of our analysis can be extended to the case with 
non-unique solutions). We denote by the triplet {x*,y*,z*) the primal-dual solution to this LP, and by 
{x, y, z) an intermediate iterate of the interior-point method. We can partition the set of the q columns 
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of A into the basic set 

B = {i\x* > 0} (37) 

and the nonbasic set 

A/'= {i|x* = 0}. (38) 

For brevity, we henceforth refer to the columns of the constraint matrix A corresponding to the basic 
variables as the "basic columns." It is not difficult to show that, for an LP with a unique solution, the 
number of basic variables, i.e., \B\, is at most p. To see this, assume that I of the standard variables 
x\. . .x^ are nonzero, which means that n — I box constraints of the form Xj > are active at x*. 
Since x* is a vertex defined by at least n active constraints in the LP, we conclude that at least I parity 
inequalities must be active at x*, thus leaving at most p — I nonzero slack variables. We call the LP 
nondegenerate if |B| = p, and degenerate if \B\ < p. 

It is known that the unique solution {x*,y*,z*) is "strictly complementary" [22], meaning that for 
any i G {1, . . . ,q} either x* = and z* > 0, or x* > and z* = 0. Remembering from (27) that 
di = \Jxilzi, as the iterates of the interior-point algorithm approach the optimum, i.e., /U given in (28) 
goes to zero, we will have 

{oo if i G ;S, 
(39) 
if z GAT, 

Therefore, towards the end of the algorithm, the matrix Q = AD^A^ will be dominated by the columns 
of A and D corresponding to the basic set. Hence, it is highly desirable to select a preconditioning set 
that includes all the basic columns, i.e., B C M, in which case AmD'^A'j^ becomes a better and better 
approximation of Q, as we approach the optimum of the LP. In the rest of this subsection, we will show 
that, when the solution to the LP is integral and /x is sufficiently small, this property can be achieved by 
low complexity algorithms similar to Algorithms 7 and 8. 

Lemma 2: Consider the extended Tanner graph T'^ for an LP subproblem LP^ of MALP decoding. 
If the primal solution to LP^ is integral, the set of variable nodes corresponding to the basic set, whose 
definition is based on the augmented form (16) of the LP, does not contain any stopping set. 

Proof: Consider an erasure decoding problem p^^c on T^, where the basic variable nodes are 
erasures. We prove the lemma by showing that the peeUng (or LP) decoder can successfully correct these 
erasures. 

We denote by u* and x* the solutions to the primal LP in the (original) standard form (9) and in the 
augmented form (16). From part c) of Theorem 2, we know that u* is also the solution to a full LP 
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decoding problem LPD^ with the LLR vector 7 and the Tanner graph comprising the standard variable 
nodes and the active check nodes, Jact- 

We partition the basic set B into Bgtd ^nd Bgik, the sets of basic standard variables and basic slack 
variables, respectively. We also partition the set of check nodes in into Jact and Jinact, the sets of 
check nodes that generate the active and inactive parity inequaUties of LP^, respectively. Clearly, the 
neighbors of the slack variable nodes in Bgik are the check nodes in Jinacu since an inactive parity 
inequaUty has, by definition, a nonzero slack. 

Step 1: We first show that, even if we remove the check nodes in Jinact from T^, the set of basic 
standard variable nodes, Bstd', does not contain a stopping set. 

Remembering the conversion of the LP in the standard form (9) with inequality constraints to the 
augmented form (16), we can write 

Bstd = e X I (7i > , < = 1) or (7i < , v* = 0)}. (40) 

Using, as in Theorem 4, the notation u for the result of bit-based hard decision on 7, one can see that 
Bstd is identical to £, the set of positions where u* and u differ. Hence, knowing that u* is the solution 
to an LP decoding problem, and using Theorem 4, we conclude that the set Bgtd does not contain a 
stopping set in the Taimer graph that only includes the check nodes in Jact- 

Step 2: Now we return to T^, and consider solving p^^C', where all the basic variables are erasures, 
using the peeling algorithm. Since the slack variables which are basic are connected only to the inactive 
check nodes, we know from Step 1 that the erased variables Bgtd can be decoded by only using the active 
check nodes Jact- Once these variable nodes are peeled off the graph, we are left with the basic slack 
variable nodes, each of which is connected to a distinct check node in Jinact- Therefore, the peeling 
algorithm can proceed by decoding all of these variables. This completes the proof. ■ 

Lemma 2 shows that, under proper conditions, the submatrix A of A formed by only including 
the columns corresponding to the basic variables can be made lower triangular by column and row 
permutations. This suggests that looking for a maximum- weight triangular set is a natural approach for 
designing a preconditioner in MALP decoding. In particular, the following theorem shows that, under 
the conditions of Lemma 2, the incremental greedy Algorithm 6 indeed finds a preconditioning set that 
includes all such columns. 

As the interior-point algorithm progresses, the basic variables approach 1, while the nonbasic variables 
approach zero. Hence, referring to (39), we see that after a large enough number of iterations, the \B\ 
highest-weight columns of A will correspond to the basic set B. The following theorem shows that two 
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of the proposed algorithms indeed find a preconditioning set that includes all such columns. 

Theorem 5: Consider an LP subproblem LP*^ of an MALP decoding problem. If the primal solution 
to LP^ is integral, at the iterates of the interior-point method that are sufficiently close to the solution, 
both the Incremental Greedy Algorithm and the Row-wise Greedy Algorithm can successfully find a 
triangular set that includes all the columns corresponding to the basic set. 

Proof: As the interior-point algorithm progresses, the weights di corresponding to the basic variables 
approach oo, while the weights of nonbasic variables approach zero. Hence, when jj, becomes sufficiently 
small, the columns corresponding to the basic set, B will be the \B\ highest- weight columns of A, and 
according to Lemma 2, the matrix Aq consisting of these columns can be made triangular, provided that 
the solution to LP^ is integral. 

In view of this result, the proof of the claim for the incremental greedy algorithm becomes straigh- 
forward: The preconditioning set M. continues to grow by one member at each iteration, at least until it 
includes all the \B\ highest-weight (i.e., basic) columns. 

To prove that the triangular set M. given by the row-wise greedy algorithm includes the basic set, 
as well, it is sufficient to show that none of the basic columns will be erased from A (i.e., become all 
zeroes) in line 1 1 of Algorithm 8. Assume that, at some iteration, a column i is selected in line 1 1 to be 
erased. Column i has the minimum weight among the nonzero columns currently in A. Therefore, if i is 
a basic column and ji is small enough, all the other nonzero columns are basic columns, as well, since 
the basic columns are the \B\ highest-weight columns of A. This means that A could be made triangular, 
without running out of degree- 1 rows and having to erase column i. So, column i cannot be basic. ■ 

Remark 2: The proof above suggests that Theorem 5 can be stated in more general terms. For any 
s G {1, ... jQ'}, let 5 be a set consisting of the s highest- weight columns of A. Then, if the set of 
variable nodes corresponding to iS in the (extended) Tanner graph does not contain a stopping set, that 
is. As can be made triangular by row and column permutations, then the preconditioning sets found by 
Algorithms 6 and 8 both contain S. 

The assumption that the solution is integral does not hold for all LPs that we solve in adaptive LP 
decoding. On the other hand, in practice, we are often interested in solving the LP exactly, only when LP 
decoding finds an integral solution (i.e., the ML codeword). This, of course, does not mean that in such 
cases every LP subproblem solved in the adaptive method has an integral solution. However, one can 
argue heuristically that, if the final LP subproblem has an integral solution, the intermediate LPs are also 
very likely to have an integral solution. To see this, remember from Theorem 2 that each intermediate 
LP problem that is solved in adaptive LP decoding is equivalent to a full LP decoding that uses a subset 
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of the check nodes in the Tanner graph. Now, if LP decoding with the complete Tanner graph has an 
integral solution, it is natural to expect that, after removing a subset of check nodes, which can also 
reduce the number of cycles, the LP decoder still very likely to find an integral solution. 

B. Performance Simulation 

We simulated the LP decoding of (3, 6)-regular LDPC codes on the AWGN channel using the MALP- 
A algorithm and our sparse implementation of the path-following interior-point method. We have shown 
earlier that, as interior-point progresses, the matrix AD'^AJ^ that needs to be inverted to compute the 
Newton steps becomes more and more ill-conditioned. We have observed that this problem becomes 
more severe in the later iterations of the MALP-A algorithm, where the LP problem is larger and more 
degenerate due to the abundance of active constraints at the optimum of the problem. 

In Figs. 5-8, we present the performance results of the PCG method for four different systems of 
linear equations in the form of (24), solved in the infeasible primal-dual path-following interior-point 
algorithm, using the preconditioners designed by greedy Algorithms 6-8.^ In these simulations, we used 
a randomly-generated (3, 6)-regular LDPC code of length 2000, where the cycles of length four were 
removed. The performance of the PCG algorithm is measured by the behavior of the relative residual 
error ||^' Hi/ II ""^11 2' where and w are defined in Algorithm 4, as a function of the iteration number i of 
the PCG algorithm. 

In Figs. 5 and 6, we considered solving (24) in two different iterations of the interior-point algorithm 
for solving an LP problem. This LP problem was selected at the 6th iteration of an MALP decoding 
problem at SNR =1.5 dB, and the solution to the LP was integral. The constraint matrix A for this LP 
had 713 rows and 2713 columns, and we used the PCG algorithm to compute the Newton step. Fig. 5 
corresponds to finding the Newton step at the 8th iteration of the interior-point algorithm. In this scenario, 
the duality gap = x^z was equal to 48.6, and the condition number k{Q) of the problem was equal 
to 3.46 X 10^. We have plotted the residual error of the CG method without preconditioning, as well as 
the PCG method using the three proposed preconditioner designs. For this problem, except during the 
first 10-15 iterations, the behaviors of the three preconditioned implementations are very similar, and all 
significantly outperform the CG method. 

In Fig. 6, we solved (24) at the 18th iteration of the same LP, where the interior-point is much closer 
to the solution, with = 0.22 and k{Q) = 2.33 x 10^. In this problem, the convergence of the CG 

^In all the simulations of the Row-wise Greedy Search (Algorithm 8) that we present in this section, we have used a diagonal 
expansion, rather than a triangular expansion, as described in Subsection V-A. 
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Fig. 5. The progress of the residual error for different PCG implementations, solving (24) in the 8th iteration of the interior- 
point algorithm, in an LP with an integral solution. The constraint matrix A has 830 rows and 3830 columns, gd = 48.6, and 
k{Q) = 3.46 X 10*. 



method is very slow, so that in 200 iterations, the residual error does not get below 0.07. The PCG 
method with incremental greedy preconditioning, reaching a residual error of 10"*^ in 40 iterations, has 
the fastest convergence, followed by the column-wise greedy preconditioned 

To study the performance of the algorithms when the LP solution is not integral, we considered an 
LP from the 6th iteration of an MALP-A decoding problem at SNR = 1.0 dB, where the solution was 
fractional. The matrix A had 830 rows and 3830 columns. Fig. 7 corresponds to the 8th iteration of 
the interior-point algorithm, with g^, = 46.4 and k{Q) = 2.03 x 10^, while Fig. 8 corresponds to the 
18th (penultimate) iteration, with = 0.155 and k{Q) = 2.61 x 10^. These parameters are chosen 
such that the scenarios in these two figures are respectively similar to those in Figs. 5 and 6, the main 
difference being that the decoding problem now has a fractional solution. We can observe that, while the 
performance of the CG method is very similar in Fig. 5 and Fig. 7, as well as in Fig. 6 and Fig. 8, the 
preconditioned implementations have slower convergence when the LP solution is fractional. In particular, 
in Fig. 8, the row-wise greedy preconditioner does not improve the convergence of the CG method, and 
is essentially ineffective. 
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Iteration Number 



Fig. 6. The progress of the residual error for different PCG implementations, solving (24) in the 8th iteration of the interior- 
point algorithm, in an LP with an integral solution. The constraint matrix A has 830 rows and 3830 colurrms, ga = 0.22, and 
k{Q) = 2.33 X 10^ 




Fig. 7. The progress of the residual error for different PCG implementations, solving (24) in the 8th iteration of the interior- 
point algorithm, in an LP with a fractional solution. The constraint matrix A has 830 rows and 3830 columns, ga = 46.4, and 
k{Q) = 2.03 X 10*. 
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Fig. 8. The progress of the residual error for different PCG implementations, solving (24) in the 8th iteration of the interior- 
point algorithm, in an LP with a fractional solution. The constraint matrix A has 830 rows and 3830 columns, gd = 0.155, and 
k{Q) = 2.61 X 10*. 



C. Discussion 

Overall, we have observed that in very ill-conditioned problems, the incremental and the column-wise 
greedy algorithms are significantly more effective than the row-wise greedy algorithm in speeding up 
the solution of the linear system. The better performance of the column-wise approach relative to the 
row- wise approach can be explained by the fact that the former, which searches for degree- 1 columns, 
has more choices at each stage, since the columns of A have lower degrees on average than its rows. 
Besides, while the column-wise is always able to find a complete triangular preconditioning set, the 
row-wise algorithm needs to expand the preconditioning set at the end by adding some slack columns 
that may have very low weights. Considering both the complexity and performance, the column-wise 
search (Algorithm 7) seems to be a suitable choice for a practical implemetation of LP decoding. 

A second observation that we have made in our simulations is that the convergence of the PCG method 
cannot be well characterized just by the condition number of the preconditioned matrix. In fact, we have 
encountered several situations where the preconditioned matrix had a much higher condition number than 
the original matrix, yet it resulted in a much faster convergence. For instance, in the scenario studied 
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in Fig. 8, tlie condition number of the preconditioned matrix M Q for both the column- wise and the 
incremental algorithms was higher than that of Q by factor of 50-100, while these preconditioners still 
improved the convergence compared to the CG method. Indeed, it is believed in the literature that the 
speed of convergence of the CG can typically be better explained by the number of distinct clusters of 
eigenvalues. 

While we studied the interior-point method in the context of MALP decoding, the proposed algorithms 
can also be applied to the LPs that may have more than one constraint from each check node. For instance, 
we have observed that the proposed implementation is also very effective for ALP decoding. However, in 
the absence of the single-constraint property, some of the analytical results we presented may no longer 
be valid. 

Vll. Conclusion 

In this paper, we studied various elements in an efficient implementation of LP decoding. We first 
studied the adaptive LP decoding algorithm and two variations and demonstrated a number of properties 
of these algorithms. Specifically, we proposed modifications of the ALP decoding algorithm that satisfy 
the single-constraint property; i.e., each LP to be solved contains at most one parity inequaUty from each 
check node of the Taimer graph. 

We later studied a sparse interior-point implementation of linear programming, with the goal of 
exploiting the properties of the decoding problem in order to achieve lower complexity. The heart of 
the interior-point algorithm is the computation of the Newton step via solving an (often ill-conditioned) 
system of linear equations. Since iterative algorithms for solving sparse linear systems, including the 
conjugate-gradient method, converge slowly when the system is ill-conditioned, we focused on finding a 
suitable preconditioner to speed up the process. 

Motivated by the properties of LP decoding, we studied a new framework for desiging a preconditioner. 
Our approach was based on finding a square submatrix of the LP constraint matrix which contains the 
columns with the highest possible weights, and at the same time, can be made lower- or upper-triangular 
by column and row permutations, making it invertible in linear time. We proposed a number of greedy 
algorithms for designing such preconditioners, and proved that, when the solution to the LP is integral, 
two of these algorithms indeed result in effective preconditioners. We demonstrated the performance of 
the proposed schemes via simulation, and we observed that the preconditioned systems are most effective 
when the current LP has an integral solution. 

One can imagine various modifications and alternatives to the proposed greedy algorithms for designing 
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preconditioners. It is also interesting to investigate the possibility of finding other adaptive or nonadaptive 
formulations of LP decoding that result in solving the fewest/smallest possible number of LPs, while 
maintaining the single-constraint property. Moreover, there are several aspects of the implementation of 
LP decoding that are not explored in this work. These potential areas for future research include the 
optimum selection of the stopping criteria and step sizes for the interior-point algorithm and the CG 
method, as well as the theoretical analysis of the effect of preconditioning on the condition number and 
the eigenvalue spectrum of the linear system, similar to the study done in [23] for network flow problems. 

Appendix I 
Proof of Theorem 2 

Proof: 

a) To prove the claim, we show that the solution to any linear program LP^ consisting of the n initial 
(single-sided) box inequaUties given by (8) and any number of parity inequalities of the form (6) 
satisfies all the double-sided box constraints of the form < Uj < 1, z G X = {1, . . . , n}. 
For simpUcity, we first transform each variable Uj, i e I, and its coefficient 7^ in the objective 

function, respectively, into a new variable Vi and a new coefficient Aj, where 

Vi = Ui and Aj = 7^ if ji > 0, 

(41) 

Vi = 1- Ui and Aj = - ji if 7^ < 0. 
By this change of variables, we can rewrite LP*' in terms of v. In this equivalent LP, all the 
variables Vi will have nonnegative coefficients Aj in the objective function, and the box constraints 
(8) will all be transformed into inequalities of the form Vi > 0. However, the transformed parity 
inequahties will still have the form 

+ 1' (42) 

although here some of the sets Aj may have even cardinaHty. To prove the claim, it suffices to 
show that the unique solution v'^ to this LP satisfies vf < 1, V z G X. 

Assume, on the contrary, that for a subset of indices £ C X, we have vf > 1, ^ i e JC, and 
0<v'^<l, VzG I\jC. We define a new vector {5*^ as 

vf = l if ieC, 

(43) 

vf = vf if ieI\C. 

Remembering that Aj > 0, ^ i e I, v/e will have X^v'' < }^v^. Moreover, v^ clearly satisfies all 
the double-sided box constraints Q < vf <\, VzGX. We claim that any parity inequality of the 
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form (42) in the LP, which is by assumption satisfied at v^, is also satisfied at v^. To see this, note 
that the first sum in (42) can only either increase or remain constant by moving from to v^, and 
it will be nonnegative at v'^. Moreover, the second sum will remain constant if £ n Bj = 0, or will 
decrease but remain greater than or equal to one if £ n Bj 7^ 0. In both cases, inequality (42) will 
be satisfied at v^. Hence, we have shown that there is a feasible point which has a cost smaller 
than or equal to that of v^. This contradicts the assumption that is the unique solution to the 
LP. Consequently, the solution to the LP should satisfy all the double-sided box constraints. 

b) We need to show that j'-^vf' < j'-u^'^^ for any Q <k < K. This is obvious for ALP decoding, as 
the feasible set of LP'^ contains the feasible set of LP^+'^. For MALP-A and MALP-B, let LP*^ 
be the problem obtained by removing from LP^ a subset (or all) of the parity inequalities that are 
inactive at its solution, u^. As discussed earlier, these inactive inequalities are non-binding, so the 
solution to LP*^ must be u^, as well. Now, LP^^^ is obtained by adding some new (violated) 
constraints to LP*^. Hence, the feasible set of LP*^ strictly contains that of LP'^'^^, which yields 

c) Similar to the proof of [9, Theorem 2]. 

d) Similar to part b), let LP*^ be the LP problem obtained by removing from LP^ all of the parity 
inequalities that are inactive at u^, and remember that is the solution to LP*^, as well. Clearly, 
all the parity inequalities in LP*^ are from check nodes with indices in J^, thus the feasible space 
of LP*^ contains that of LPD^. Hence, it remains to show that vf^, the optimum feasible point 
for LP*^, is also in the feasible space of LPD^. 

Let C {1, . . . , n} be the set of indices of variable nodes that are involved in at least one of 
the parity inequalities in LP*'^ (or, equivalently, check nodes in J^), and let l'^ be the set of the 
remaining indices. According to Corollary 2, all the parity inequaUties from check nodes in j'^ are 
satisfied at u'^. In addition, we can conclude from Corollary 1 that the box constraints for variables 
with indices in l'^ are satisfied, as well. 

Now, for any i G l'^, the variable Ui will be decoupled from all other variables, since it is only 
constrained by a box constraint according to (8). Hence, in the solution such a variable will 

take the value = if 7^ > or = 1 if 7^ < 0.^ Consequently, u'' satisfies all the parity 
inequalities and box constraints of LPD^, and hence is the solution to this LP decoding problem. 

*We assume that 7i 7^ 0, since otherwise, m*^ will not have a unique optimum value, which contradicts the uniqueness 
assimiption on vl' in the theorem. 
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